
Generating Images


Generating Images with Multimodal Language Models

Neural Information Processing Systems

We propose a method to fuse frozen text-only large language models (LLMs) with pre-trained image encoder and decoder models, by mapping between their embedding spaces. Our model demonstrates a wide suite of multimodal capabilities: image retrieval, novel image generation, and multimodal dialogue. Ours is the first approach capable of conditioning on arbitrarily interleaved image and text inputs to generate coherent image (and text) outputs. To achieve strong performance on image generation, we propose an efficient mapping network to ground the LLM to an off-the-shelf text-to-image generation model. This mapping network translates hidden representations of text into the embedding space of the visual models, enabling us to leverage the strong text representations of the LLM for visual outputs.
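The abstract does not spell out the mapper architecture, but the idea translates naturally into code. Below is a minimal, hypothetical sketch of such a mapping network: learned query vectors attend over the frozen LLM's hidden states and are projected into the conditioning space of an off-the-shelf text-to-image model. All dimensions here (4096-dim LLM states, 768-dim conditioning vectors, 77 output tokens) are illustrative assumptions, not the paper's configuration.

```python
import torch
import torch.nn as nn

class TextToImageMapper(nn.Module):
    """Sketch of a mapping network that projects frozen-LLM hidden
    states into the embedding space of a text-to-image model.
    Dimensions are illustrative, not the paper's exact setup."""

    def __init__(self, llm_dim: int = 4096, t2i_dim: int = 768,
                 num_queries: int = 77):
        super().__init__()
        # Learned queries attend over the LLM's hidden states and emit
        # a fixed-length sequence the image decoder can condition on.
        self.queries = nn.Parameter(torch.randn(num_queries, llm_dim))
        self.attn = nn.MultiheadAttention(llm_dim, num_heads=8,
                                          batch_first=True)
        self.proj = nn.Linear(llm_dim, t2i_dim)

    def forward(self, llm_hidden: torch.Tensor) -> torch.Tensor:
        # llm_hidden: (batch, seq_len, llm_dim), from the frozen LLM.
        batch = llm_hidden.size(0)
        q = self.queries.unsqueeze(0).expand(batch, -1, -1)
        out, _ = self.attn(q, llm_hidden, llm_hidden)
        return self.proj(out)  # (batch, num_queries, t2i_dim)
```

In this setup only the mapper's parameters would be trained; the LLM and the image generator stay frozen, consistent with the approach the abstract describes.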


Generating Images with Perceptual Similarity Metrics based on Deep Networks

Neural Information Processing Systems

We propose a class of loss functions, which we call deep perceptual similarity metrics (DeePSiM), that allow generating sharp, high-resolution images from compressed abstract representations. Instead of computing distances in image space, we compute distances between image features extracted by deep neural networks. This metric reflects the perceptual similarity of images much better and thus leads to better results. We demonstrate two use cases of the proposed loss: (1) networks that invert the AlexNet convolutional network; (2) a modified variational autoencoder that generates realistic high-resolution random images.
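As a rough illustration of the core idea (not the paper's full loss, which also combines image-space and adversarial terms), a feature-space distance can be computed against a frozen pretrained network; here torchvision's AlexNet conv stack stands in as the comparator:

```python
import torch
import torch.nn.functional as F
from torchvision.models import alexnet, AlexNet_Weights

# Frozen feature extractor; the conv-stack output serves as the
# comparison space. Gradients still flow back to the generated image.
feat_net = alexnet(weights=AlexNet_Weights.DEFAULT).features.eval()
for p in feat_net.parameters():
    p.requires_grad_(False)

def perceptual_loss(generated: torch.Tensor,
                    target: torch.Tensor) -> torch.Tensor:
    """Distance in deep-feature space rather than pixel space."""
    return F.mse_loss(feat_net(generated), feat_net(target))
```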


Evaluating and comparing gender bias across four text-to-image models

Hammad, Zoya, Sowah, Nii Longdon

arXiv.org Artificial Intelligence

SUMMARY: As we increasingly use Artificial Intelligence (AI) in decision-making for industries like healthcare, finance, e-commerce, and even entertainment, it is crucial also to reflect on the ethical aspects of AI, for example the inclusivity and fairness of the information it provides. In this work, we evaluated different text-to-image AI models and compared the degree of gender bias they present. The evaluated models were Stable Diffusion XL (SDXL), Stable Diffusion Cascade (SC), DALL-E, and Emu. We hypothesized that DALL-E and Stable Diffusion, which are comparatively older models, would exhibit a noticeable degree of gender bias towards men, while Emu, which was recently released by Meta AI, would produce more balanced results. As hypothesized, we found that both Stable Diffusion models exhibit a noticeable degree of gender bias, while Emu demonstrated more balanced results (i.e., less gender bias). Interestingly, however, OpenAI's DALL-E exhibited almost the opposite pattern: the ratio of women to men was significantly higher in most cases tested. Although we still observed a bias here, it favored women over men. This may be explained by the fact that OpenAI changed the prompts at its backend, as observed during our experiment. We also observed that Meta AI's Emu utilized user information while generating images via WhatsApp. Finally, we proposed some potential solutions to avoid such biases, including ensuring diversity across AI research teams and having diverse datasets.

INTRODUCTION: Artificial Intelligence (AI) has been growing remarkably in recent years, impacting numerous aspects of our daily lives. One such area of significant advancement is text-to-image generation.
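For a sense of how such a comparison might be tallied, here is a hypothetical sketch: generations are annotated with a perceived-gender label (by hand or by a classifier) and a per-model women-to-men ratio is computed. The counts below are invented for illustration and are not the paper's data.

```python
from collections import Counter

def gender_ratio(labels: list[str]) -> float:
    """Women-to-men ratio over a batch of annotated generations;
    labels are 'woman' or 'man'. Values above 1.0 mean more women."""
    counts = Counter(labels)
    return counts["woman"] / max(counts["man"], 1)

# Hypothetical annotations for one occupation prompt across models:
observed = {
    "SDXL": ["man"] * 8 + ["woman"] * 2,
    "DALL-E": ["woman"] * 7 + ["man"] * 3,
}
for model, labels in observed.items():
    print(model, round(gender_ratio(labels), 2))
```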


Tackling fake images in cybersecurity -- Interpretation of a StyleGAN and lifting its black-box

Laubmann, Julia, Reschke, Johannes

arXiv.org Artificial Intelligence

In today's digital age, concerns about the dangers of AI-generated images are increasingly common. One powerful tool in this domain is StyleGAN (style-based generative adversarial network), a generative adversarial network capable of producing highly realistic synthetic faces. To gain a deeper understanding of how such a model operates, this work focuses on analyzing the inner workings of StyleGAN's generator component. Key architectural elements and techniques, such as the Equalized Learning Rate, are explored in detail to shed light on the model's behavior. A StyleGAN model is trained using the PyTorch framework, enabling direct inspection of its learned weights. Through pruning, it is revealed that a significant number of these weights can be removed without drastically affecting the output, leading to reduced computational requirements. Moreover, the role of the latent vector, which heavily influences the appearance of the generated faces, is closely examined. Global alterations to this vector primarily affect aspects like color tones, while targeted changes to individual dimensions allow precise manipulation of specific facial features. This ability to fine-tune visual traits is not only of academic interest but also highlights a serious ethical concern: the potential misuse of such technology. Malicious actors could exploit this capability to fabricate convincing fake identities, posing significant risks in the context of digital deception and cybercrime. In today's modern age, models to generate human faces are based on StyleGANs, which have been trained on the FFHQ (Flickr-Faces-HQ) image dataset [1].
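The abstract does not give the exact pruning procedure, but a generic magnitude-pruning sketch over a PyTorch model, plus a targeted latent-dimension edit, conveys the two experiments it describes. The layer types, the 30% pruning fraction, the 512-dim latent, and the edited dimension index are all illustrative assumptions, not the paper's settings.

```python
import torch

def prune_by_magnitude(model: torch.nn.Module,
                       fraction: float = 0.3) -> None:
    """Zero out the smallest-magnitude weights across all linear and
    conv layers: a crude global-threshold sketch of weight pruning."""
    all_weights = torch.cat([m.weight.detach().abs().flatten()
                             for m in model.modules()
                             if isinstance(m, (torch.nn.Linear,
                                               torch.nn.Conv2d))])
    threshold = torch.quantile(all_weights, fraction)
    with torch.no_grad():
        for m in model.modules():
            if isinstance(m, (torch.nn.Linear, torch.nn.Conv2d)):
                m.weight.mul_((m.weight.abs() >= threshold).float())

# Targeted latent edit: perturb a single dimension of the latent vector
# (512 is StyleGAN's typical latent size; index 42 chosen arbitrarily).
z = torch.randn(1, 512)
z_edited = z.clone()
z_edited[0, 42] += 3.0
```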


Seeing Soundscapes: Audio-Visual Generation and Separation from Soundscapes Using Audio-Visual Separator

Kang, Minjae, Brandão, Martim

arXiv.org Artificial Intelligence

Recent audio-visual generative models have made substantial progress in generating images from audio. However, existing approaches focus on generating images from single-class audio and fail to generate images from mixed audio. To address this, we propose an Audio-Visual Generation and Separation model (AV-GAS) for generating images from soundscapes (mixed audio containing multiple classes). Our contribution is threefold: First, we propose a new challenge in the audio-visual generation task, which is to generate an image given a multi-class audio input, and we propose a method that solves this task using an audio-visual separator. Second, we introduce a new audio-visual separation task, which involves generating separate images for each class present in a mixed audio input. Lastly, we propose new evaluation metrics for the audio-visual generation task: Class Representation Score (CRS) and a modified R@K. Our model is trained and evaluated on the VGGSound dataset. We show that our method outperforms the state-of-the-art, achieving 7% higher CRS and 4% higher R@2* in generating plausible images with mixed audio.
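The modified R@K (denoted R@2* above) is specific to the paper, but the standard R@K it builds on is easy to state. The sketch below assumes a similarity matrix with ground-truth matches on the diagonal; it is not the paper's modified metric.

```python
import numpy as np

def recall_at_k(sim: np.ndarray, k: int = 2) -> float:
    """Standard R@K: fraction of queries whose ground-truth item ranks
    in the top k by similarity. sim[i, j] scores query i against
    candidate j; ground truth for query i is candidate i."""
    ranks = (-sim).argsort(axis=1)          # descending similarity
    hits = [i in ranks[i, :k] for i in range(sim.shape[0])]
    return float(np.mean(hits))
```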


Reviews: Generating Images with Perceptual Similarity Metrics based on Deep Networks

Neural Information Processing Systems

I think the most important contribution of the manuscript is to describe a method that substantially improves image reconstruction from compressed deep network representations. In that regard I would have liked an analysis of the compression rate for the reconstructions from the different feature spaces, in particular because the difference in quality between layers conv5 and fc6 does not seem too large, whereas there is roughly a 10-fold reduction in dimensionality in the feature representation (13 x 13 x 256 = 43,264 dimensions vs. 4,096). One factor that should definitely be discussed in the paper is that the adversarial prior appears to force the reconstructions to reuse image data from the training set. This is not necessarily a problem in terms of image compression, but it is an important factor: a careful choice of training data might be important depending on what type of images one wants to compress.


How Do Generative Models Draw a Software Engineer? A Case Study on Stable Diffusion Bias

Fadahunsi, Tosin, d'Aloisio, Giordano, Di Marco, Antinisca, Sarro, Federica

arXiv.org Artificial Intelligence

Generative models are nowadays widely used to generate graphical content for multiple purposes, e.g., the web, art, and advertising. However, it has been shown that the images generated by these models can reinforce societal biases that already exist in specific contexts. In this paper, we focus on understanding whether this is the case when one generates images related to various software engineering tasks. In fact, the Software Engineering (SE) community is not immune to gender and ethnicity disparities, which could be amplified by the use of these models. Hence, if used without awareness, artificially generated images could reinforce these biases in the SE domain. Specifically, we perform an extensive empirical evaluation of the gender and ethnicity bias exposed by three versions of the Stable Diffusion (SD) model (a very popular open-source text-to-image model) - SD 2, SD XL, and SD 3 - towards SE tasks. We obtain 6,720 images by feeding each model with two sets of prompts describing different software-related tasks: one set includes the Software Engineer keyword, and one set does not include any specification of the person performing the task. Next, we evaluate the gender and ethnicity disparities in the generated images. Results show that all models are significantly biased towards male figures when representing software engineers. On the contrary, while SD 2 and SD XL are strongly biased towards White figures, SD 3 is slightly more biased towards Asian figures. Nevertheless, all models significantly under-represent Black and Arab figures, regardless of the prompt style used. The results of our analysis highlight severe concerns about adopting those models to generate content for SE tasks and open the field for future research on bias mitigation in this context.
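To make the two-set prompt design concrete, here is a hypothetical sketch of how the prompt grid and a simple disparity score might look; the task list, templates, and counts are invented for illustration and do not reproduce the paper's 6,720-image protocol.

```python
# Two prompt sets mirroring the paper's design: one naming the role,
# one leaving the person unspecified (tasks here are invented).
tasks = ["writing code", "reviewing a pull request",
         "debugging a program"]
prompts_with_keyword = [f"A Software Engineer {t}" for t in tasks]
prompts_without_keyword = [f"A person {t}" for t in tasks]

def disparity(counts: dict[str, int]) -> float:
    """Share of the majority label among annotated generations;
    0.5 is balanced for a binary attribute."""
    return max(counts.values()) / sum(counts.values())

print(disparity({"male": 54, "female": 6}))  # illustrative counts -> 0.9
```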


Why Is AI So Bad at Generating Images of Kamala Harris?

WIRED

When Elon Musk shared an image showing Kamala Harris dressed as a "communist dictator" on X last week, it was quite obviously a fake, seeing as Harris is neither a communist nor, to the best of our knowledge, a Soviet cosplayer. And, as many observers noted, the woman in the photo, presumably generated by X's Grok tool, had only a passing resemblance to the vice president. "AI still is unable to accurately depict Kamala Harris," one X user wrote. "Grok put old Eva Longoria in a snazzy outfit and called it a day," another quipped, noting the similarity of the "dictator" pictured to the Desperate Housewives star. "AI just CANNOT replicate Kamala Harris," a third posted.